Skip to content

Comments

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch#17

Merged
cluster2600 merged 9 commits intomainfrom
sprint2/reliability-risk
Feb 18, 2026
Merged

Sprint 2 — Rate limiting, leverage safety, Redis kill-switch#17
cluster2600 merged 9 commits intomainfrom
sprint2/reliability-risk

Conversation

@cluster2600
Copy link
Owner

Sprint 2: Reliability & Risk Management

Issues fixed

Commits

Testing

  • Import/logic checks passed locally
  • All env vars documented in .env.example

Next: Sprint 3

  • Paper trading validation (Binance testnet)
  • Live trading go/no-go decision

maxime added 9 commits February 17, 2026 22:27
…N env var

Hardcoded token 'trading-bot-token' exposed Vault credentials to anyone
with repo read access. Now raises EnvironmentError if VAULT_TOKEN is not set,
forcing operators to explicitly provide the secret at runtime.
…_PASSWORD env var

Hardcoded password 'elvis_password' exposed database credentials in source.
POSTGRES_CONFIG now reads all connection params from environment variables:
  DB_HOST, DB_PORT, DB_USER, DB_PASSWORD (required), DB_NAME
Raises EnvironmentError at startup if DB_PASSWORD is not set.
The Flask API on 0.0.0.0:5050 had no authentication — any host on the
network could read trade data. Added a before_request hook that validates
the X-API-Key header against the API_KEY environment variable on every
request except /health (exempted for load-balancer/Docker health checks).
Returns HTTP 401 on invalid key, HTTP 503 if API_KEY is not configured.
Added three new endpoints to trading/utils/trade_history_api.py:

  POST   /emergency_stop         — Activate kill-switch (halts trading)
  DELETE /emergency_stop         — Clear kill-switch (allow resume)
  GET    /emergency_stop/status  — Read current kill-switch state

All endpoints are protected by the X-API-Key auth added in #13.
The global KILL_SWITCH_ACTIVE flag must be polled by the trading engine
in main.py to honour the stop signal.
Activation is logged at CRITICAL level with timestamp and remote IP.
Added documentation for VAULT_TOKEN (#10), DB_PASSWORD (#11),
and API_KEY (#13) with clear REQUIRED markers and usage notes.
Adds hyphen-variant routes (/emergency-stop, /emergency-stop/status)
as aliases to the existing underscore endpoints, matching the sprint spec.
Adds public is_trading_halted() helper so trading loops can import it:

    from trading.utils.trade_history_api import is_trading_halted
    if is_trading_halted():
        break  # stop all order placement

All routes are protected by the X-API-Key auth from ISSUE #13.
POST returns 200 with JSON confirmation; 409 if already stopped.
DELETE clears the flag; GET /emergency-stop/status is read-only.
…ntial backoff

- Add utils/binance_rate_limiter.py:
  * binance_retry decorator (tenacity): 5 attempts, exponential back-off 1-60 s
  * check_rate_limit_headers(): reads X-MBX-USED-WEIGHT-1M and X-MBX-ORDER-COUNT-1S
    headers; logs utilisation at DEBUG and auto-pauses (5 s default) when >80%
    of the 1200 req/min weight limit is consumed; honours Retry-After header
  * rate_limited_call() convenience wrapper for raw requests.Response callers
  * All thresholds configurable via env vars (BINANCE_RATE_LIMIT_WARN_FRACTION,
    BINANCE_WEIGHT_LIMIT_PER_MIN, BINANCE_RETRY_MULTIPLIER, etc.)
- Wrap get_balance(), get_position(), get_current_price(), set_leverage() in
  binance_executor.py with @binance_retry inner functions
- Add tenacity to requirements.txt
…idation

- config/config.py:
  * DEFAULT_LEVERAGE changed from hardcoded 100 to 3 (read from DEFAULT_LEVERAGE
    env var, defaulting to 3 if unset)
  * Add validate_leverage_config(leverage) function:
    - leverage > 5x  → WARNING log on every startup
    - leverage > 10x → EnvironmentError unless OVERRIDE_HIGH_LEVERAGE=true
    - leverage <= 0  → ValueError immediately

- trading/execution/binance_executor.py:
  * __init__ default_leverage parameter changed from int=100 to int=None
    (resolves from TRADING_CONFIG['DEFAULT_LEVERAGE'] when not explicitly passed)
  * Calls validate_leverage_config() on startup — bot refuses to run with
    leverage > 10x unless operator has set OVERRIDE_HIGH_LEVERAGE=true

- .env.example: document DEFAULT_LEVERAGE, OVERRIDE_HIGH_LEVERAGE, and
  all new Issue #12 / #15 env vars
…starts)

- Replace purely in-memory kill-switch state with Redis-backed persistence:
  * On ACTIVATE  → redis.set('ELVIS_KILL_SWITCH', '1')
  * On DEACTIVATE → redis.delete('ELVIS_KILL_SWITCH')
  * On STARTUP   → reads Redis; if key is present, sets KILL_SWITCH_ACTIVE=True
                   and logs a CRITICAL warning — trading stays halted until
                   a human explicitly clears it via DELETE /emergency_stop

- Add helper functions:
  * _get_redis_client()      — connects using REDIS_HOST/PORT/DB/PASSWORD env vars
  * _redis_set_kill_switch() — writes/deletes key; returns bool (success)
  * _redis_get_kill_switch() — reads key; falls back to in-memory if Redis down

- GET /emergency_stop/status now reads from Redis on every call and syncs the
  in-memory flag (response includes source: 'redis')

- POST /emergency_stop and DELETE /emergency_stop responses now include
  redis_persisted / redis_updated fields to confirm persistence

- Graceful fallback: if Redis is unavailable, behaviour is identical to Sprint 1
  (in-memory only) and a WARNING is logged — bot does not crash

- All Redis connection params controlled by env vars (no hardcoding)
@claude
Copy link

claude bot commented Feb 18, 2026

Claude finished @cluster2600's task —— View job


Code Review Complete ✅

I've completed a comprehensive review of Sprint 2's reliability and risk management features. Here's my detailed feedback:

🎯 Overall Assessment: STRONG

This PR successfully implements three critical reliability features with solid engineering practices. The code demonstrates defensive programming, proper error handling, and good documentation.


Strengths

Issue #12: Rate Limiting Implementation

  • Excellent use of tenacity for exponential backoff retry logic
  • Smart rate limit detection via response headers (X-MBX-USED-WEIGHT-1M)
  • Configurable thresholds via environment variables
  • Automatic pause at 80% capacity to prevent 429 errors
  • Proper exception handling for different Binance SDK versions

Issue #14: Leverage Safety Validation

  • Well-designed validation function with clear warning thresholds:
    • > 5x: Warning logs (good user awareness)
    • > 10x: Requires explicit OVERRIDE_HIGH_LEVERAGE=true override
  • Default reduced from dangerous 100x → safer 3x leverage
  • Validation occurs at startup preventing accidental high-leverage trading
  • Clean separation of concerns in config.py:116-166

Issue #15: Redis Kill-Switch Persistence

  • Robust fallback strategy: Redis → in-memory if Redis unavailable
  • RESTful API design with both /emergency_stop and /emergency-stop endpoints
  • Proper state synchronization between Redis and memory
  • Comprehensive logging with audit trail (IP, timestamp)
  • Process restart survival achieved correctly

⚠️ Areas for Improvement

1. Security Concerns

2. Error Handling Edge Cases

  • Rate Limiter: binance_rate_limiter.py:84-91 - ClientError status code extraction could fail if attribute doesn't exist
# Consider safer approach:
status = getattr(exc, "status_code", None) or getattr(exc, "code", None)

3. Configuration Validation

  • Environment Variables: Missing validation for numeric env vars in binance_rate_limiter.py:42-47
  • Consider adding bounds checking:
_WEIGHT_WARN_FRACTION = max(0.1, min(1.0, float(os.getenv("...", "0.80"))))

4. Testing Gaps

  • No visible unit tests for the new rate limiting logic
  • Kill-switch Redis persistence edge cases not covered
  • Leverage validation boundary conditions need testing

🚀 Performance Considerations

Positive Impact

  • Rate limiting prevents API ban/suspension (major reliability win)
  • Redis persistence adds minimal overhead (~1ms per operation)
  • Exponential backoff reduces server load during issues

Potential Concerns

  • Redis calls in critical trading path (trade_history_api.py:578-584)
  • Consider connection pooling for Redis client reuse
  • Rate limit pause (5s default) may affect high-frequency strategies

🔒 Security Analysis

Good Practices

  • Fail-closed API authentication (blocks when API_KEY not set)
  • No hardcoded secrets (removed from previous versions)
  • Proper input validation in kill-switch endpoints

Recommendations

  • Add request size limits to prevent DoS via large payloads
  • Consider implementing API request logging for security auditing
  • Redis connection should use authentication if running in production

📊 Code Quality

Excellent

  • Clear docstrings and inline comments
  • Consistent error handling patterns
  • Proper type hints in new functions
  • Environment variable configuration with sensible defaults

Minor Issues

  • main.py has grown quite large (1300+ lines) - consider splitting
  • Some magic numbers could be made configurable (trade_history_api.py:231)

🧪 Test Coverage Recommendations

  1. Rate Limiter Tests:

    • Test retry behavior with different exception types
    • Verify exponential backoff timing
    • Test header parsing edge cases
  2. Leverage Validation Tests:

    • Boundary conditions (exactly 5x, 10x)
    • Invalid override values
    • Missing environment variables
  3. Kill-Switch Tests:

    • Redis connection failures
    • Concurrent activation/deactivation
    • State persistence across restarts

🎉 Final Verdict

APPROVAL RECOMMENDED

This PR significantly improves the reliability and safety of the trading system. The implementations are well-engineered with proper error handling and configuration options. The minor issues identified don't block deployment but should be addressed in future iterations.

Priority for follow-up:

  1. Add comprehensive test suite
  2. Split large files for maintainability
  3. Add production Redis authentication
  4. Implement API request auditing

Great work on implementing these critical safety features!


@cluster2600 cluster2600 merged commit 91af761 into main Feb 18, 2026
4 of 8 checks passed
cluster2600 added a commit that referenced this pull request Feb 18, 2026
Sprint 2 — Rate limiting, leverage safety, Redis kill-switch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant